Purpose

This script provides the infrastructure to process the raw seabird data files into a usable format for the Bay Study.

Method

Data outputs

The seabird instrument is a CTD (a file recording Conductivity, Temperature, and Depth data) represented as two types of data files, a .cnv and a .hex file. Both can be opened in a text editor and both contains data related to the cast. According to Jillian, the seabird has proprietary software that converts the .hex file into a .cnv file that contains the tabular data. Therefore, the script will only focus on the .cnv file going forward.

Reading in the data

Data is read via the read.ctd.sbe function as part of the oce package. This function is specific only to a Seabird CTD file output. Alternatively, the read.ctd function can be used if users are not sure what instrument the CTD file (has overhead code to check which instrument the CTD comes from). Although the .cnv file is a text file and can be simply read as such, the main advantages of the read.ctd.sbe file is two folds: 1) metadata is automatically populated and 2) pressure is calculated from the depth, condutivity, and temperature data. Since I have yet to explore the relationships between pressure, depth, conductivity, and temperature to confirm a linear relationship, it is simply easier to use the read.ctd.sbe function to access the pressure data.

# Specify the month, year, and total number of casts
# A cast is defined as a deployment of the Seabird into the water to collect data. Theoretically, there is one cast per station.

# Function to read in the seabird data
# Make sure you're connected to VPN if this is on the U drive
read_seabird <- function(year, month,
                         path = "U:\\LTM\\Bay Study\\SeaBird\\Data") {
  # Does the path exist?
  if (!file.exists(path)) stop("Make sure the file path exists.", call. = F)
  
  if (missing(year)) {
    message("Pick a year:")
    return(print(list.files(path)))
  }
  
  if (missing(month)) {
    message("Pick a month:")
    return(print(list.files(file.path(path, year))))
  }
  
  # List of files in the designated year and month
  files <- list.files(file.path(path, year, month), pattern = "*.cnv", full.names = TRUE)
  # Reading in all the files. Will return a list
  lapply(files, function(x) {
    # Pulling the cast name from the file name, which is the values between the month, year, and .cnv strings
    castName <- gsub(paste0(".*/", year, "/", month, "/", month, year, "(\\d+)\\.cnv"), "\\1", x)
    
    oce::read.ctd.sbe(x, station = castName)
  }) %>% 
    setNames(gsub(".*/(.*)\\.cnv", "\\1", files))
}

# Actually read it in
seabirdCtdJan23 <- read_seabird(2023, "January")
# Do this across ALL the casts in that file
seabirdJan23 <- lapply(seabirdCtdJan23, function(x) {
  slot(x, "data") %>% 
    data.frame() %>% 
    mutate(cast = x@metadata$station,
           startTime = force_tz(x@metadata$startTime, tzone = "America/Los_Angeles"),
           month = format(startTime, format = "%m"),
           depthBin = cut(depth, 
                          breaks = c(0, seq(0.7, ceiling(max(depth)) + 0.2, by = 0.5)),
                          labels = seq(0.5, ceiling(max(depth)), by = 0.5)))
})

Isolating the downcast portion of the data

The seabird is initially soaked at the top of the water surface before being cast down into the water. The seabird logs data throughout this period, but we would like to only grab data from when it is descending the water column. We can use a breakpoint analysis to find this segment during each cast. A breakpoint analysis involves fitting a model that fits several linear models across the data; the connections between these individual lines represents the breakpoints, from which we can leverage to isolate the downcast. The model trains on the pressure value and is given the freedom to iterate up to a max number of 5 breakpoints. This value, Kmax, can be changed but initial analysis shows that 5 works well.

library(segmented)

# Fit the model, extract the breakpoints and various fit metrics
breakpointsSelect <- lapply(seabirdJan23, 
                            function(x) {
                              
                              x <<- x
                              lmMod <- lm(pressure ~ scan, data = x)
                              
                              # Change Kmax if needed
                              fitSegmented <- tryCatch(selgmented(lmMod, type = "bic", Kmax = 5),
                                                       error = function(cond) {
                                                         warning(cond, 
                                                                 call. = F)
                                                       })
                              
                              rsme <- sqrt(mean((x$pressure - fitSegmented$fitted.values)^2))
                              breakpoints <- fitSegmented$psi[, 2]
                              
                              p <- function(y) {
                                plot(fitSegmented)
                                points(x[, c("scan", "pressure")])
                                abline(v = breakpoints, col = "#FF4D00")
                              }
                              
                              # Return this
                              list(data = x,
                                   mod = fitSegmented,
                                   rsme = rsme,
                                   breakpoints = breakpoints,
                                   plot = function(){p(fitSegmented)})
                            }) %>% 
  setNames(names(seabirdJan23))
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  1373.6898  1275.2490   192.2401   193.4804 -1027.4349 -1034.2194 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2332.7993  2109.5922   430.6863   433.3173   445.5128 -1293.3397 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  1380.7901  1276.2965   227.5757   224.9449 -1064.4610 -1784.4546 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2166.5495  2009.5870   368.1030 -1146.7781   386.8056 -1348.4399 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##  1045.68994   964.55163   969.62478   -38.44593   -26.91763 -1261.87201 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2097.8573  1934.6604   231.8123   237.9149 -1331.0200   256.7610 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  1529.9364  1419.1485   276.2517   272.2196 -1061.9336 -1060.5751 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2507.4056  2417.4226   281.4545   258.3791  -356.9328 -1066.2651 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##  1034.75201   955.18712   958.09052    62.40718    74.03101 -1017.84757 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 1137.2726 1050.1356  295.8374  300.8899 -969.0046 -958.5499 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2493.9580  2233.9680   305.4928  -652.2075  -928.7491 -1165.7546 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2022.1152  1863.8197   189.2811   184.2769 -1220.9249 -1777.0192 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##   730.1752   661.0903  -102.5939  -110.5507 -1300.6263         NA 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##            0            1            2            3            4            5 
##   750.862916   683.474222   682.466842   -15.000259 -1540.027803     7.744116 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##   822.01979   755.14803   757.20423   -18.23934 -1250.45841 -1242.14328 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  1649.0453  1512.1214   124.4223   121.4241   132.6003 -1437.2588 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  1887.9475  1730.5567   193.9997   180.3350 -1269.1369   202.8412 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  1944.7676  1778.6865   234.5148   228.4115 -1284.0124 -1279.2787 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2042.4269  1861.1223   556.6304   560.5748 -1434.1316 -1429.5565 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##   414.66581   311.52462   -72.52492  -973.72246   -65.98031 -1424.49333 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  897.53806  828.42145  831.25690   34.19534 -230.87001 -508.28721 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##   -97.37275  -464.02006  -720.20126 -1032.71643 -1026.94441 -1054.53568 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2137.6638  1908.7248   662.7625  -571.5509  -920.2182 -1065.1409 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##   537.8641   467.9012   467.9433  -131.7067 -1246.7035 -1300.8363 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
##  981.8082  897.8037  194.5997  188.5088 -469.3392 -462.2944 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 2482.9506 2228.0688  546.7809 -372.7311 -462.0744 -406.6882 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
##  679.7221  576.7691  164.6503  165.7044 -542.6680 -628.6545 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##        0        1        2        3        4        5 
## 740.2834 741.4615 683.2352 238.6401 166.0052 166.3922 
## No. of selected breakpoints:  4  
## breakpoint estimate(s): 10.07565 11.37635 270.8294 291.5222 293.9892 
## Error : at least one coef is NA: breakpoint(s) at the boundary? (possibly with many x-values replicated)
## breakpoint estimate(s): 10.07565 11.37635 270.8294 291.5222 293.9892 
## Error : at least one coef is NA: breakpoint(s) at the boundary? (possibly with many x-values replicated)
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 813.14003 739.19249 -33.54109 -47.89093 -38.00787        NA 
## No. of selected breakpoints:  3  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 2483.0472 2259.0862  441.7947  442.1874 -451.1717 -456.5763 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##   888.73002   814.97564   816.91136   -33.50717 -1015.85128 -1004.57826 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##   929.0426   844.5407   120.4342  -751.5495 -1134.2032 -1125.1059 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 1985.8268 1789.3462  247.9881  240.6862 -816.6322 -804.9337 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 3749.5919 3240.5537  497.5725 -464.8167 -515.1256 -451.1259 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2179.2326  1985.4520   475.1805  -970.7560 -1175.2969 -1424.7996 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##   7067.036   6790.879   4431.347 -12214.312   4463.171   4478.972 
## No. of selected breakpoints:  3  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 2474.5381 2174.4660  641.1900 -194.4778 -288.3294 -314.8408 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 1823.2457 1646.6961  273.2282  -82.6773 -246.6227 -135.2775 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  996.42428  906.74877  -41.36237  -56.11420 -836.13649  -33.52033 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 3616.5770 3026.1138  191.6586 -490.6517 -606.2570 -552.4531 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##         0         1         2         3         4         5 
## 1151.1155 1029.9602  225.0398  209.2788  221.0290 -775.2246 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  929.04912  838.08368   65.31832   65.74085  -17.02793 -839.70432 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##   811.74474   737.74512   737.71833    51.49536 -1065.15146 -1060.25635 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2022.5389  1833.5710   305.4828   298.6070 -1103.4756 -1096.1259 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  570.79356  486.39334   18.91778   17.35034 -887.50450 -875.95990 
## No. of selected breakpoints:  4  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
## 2994.43738 2565.55541  605.14976  508.48020  298.84247   41.62799 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##  1162.58946  1012.74993    44.25341    21.26103 -1082.87466 -1159.32252 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2835.1940  2482.2167   480.5125   485.0955 -1090.9989 -1438.6539 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##           0           1           2           3           4           5 
##   962.29790   886.89491   890.92238   -90.34423 -1268.11459 -1288.25494 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2534.3008  2418.1016   261.0888   256.7558   267.8822 -1672.3627 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2323.3689  2098.5850   493.9556 -1084.3903 -1139.1639 -1464.1182 
## No. of selected breakpoints:  5  
## BIC to detect no. of breakpoints
## BIC values:
##          0          1          2          3          4          5 
##  2786.1751  2448.6982   341.9478 -1073.6856 -1246.2982 -1196.1912 
## No. of selected breakpoints:  4

This analysis yields several breakpoints per cast, and we must isolate the downcast manually.

# An example of the breakpoints fit
breakpointsSelect[[1]]$plot()

To do this, there are several filters that are used: these filters are preliminary and should be adjusted over time as this approach is adopted

  1. the max depth designates the end of the downcast (the first instance if there are multiple instances of the same max depth)
  2. the beginning of the downcast must be at least 10 scans before the max depth
  3. the difference between the depth of the scan at the beginning of the downcast with the depth of the scan 10 instances later is less than -1.

Data in between this beginning and ending scan are isolated as the downcast.

downcastBreakpoints <- lapply(breakpointsSelect,
       function(x) {
         x$data %>% 
           mutate(breakpoints = ifelse(row_number() %in% round(x$breakpoints), 
                                           scan, NA),
                      maxDepth = ifelse(depth == max(depth, na.rm = T), scan, NA),
                  fromMax = scan - max(maxDepth, na.rm = T),
                  slope = depth - lead(depth, 15),
                  bookEndDowncast = ifelse(!is.na(maxDepth) | (!is.na(breakpoints) & fromMax < -10 & slope < -1) &
                                             scan <= first(which(depth == max(depth, na.rm = T))),
                                           T, F)) %>% 
           filter(between(scan, first(which(bookEndDowncast == T)), last(which(bookEndDowncast == T))))
       })

We can quicky check to see if each cast has correctly isolated only the downcast (increasing depth):

downcastBreakpoints %>% 
  bind_rows(.id = "cast") %>% 
  ggplot(aes(scan, pressure)) +
  geom_line() +
  facet_wrap(~cast, scales = "free")

Binding each cast to the station

This requires an input file that lists the date, cast, time, and station taken from the seabird datasheet. This information is joined to the seabird data to assign the station to the downcast data.

stationCast <- read.csv(file.path("SeabirdLogJan-Mar2023.csv")) %>%
  group_by(Station) %>% 
  transmute(startTimeLog = as.POSIXct(paste0(Date, " ", ifelse(nchar(Time) == 3, paste0("0", Time), Time)), 
                                   format = "%m/%d/%Y %H%M"),
            month = format(as.Date(Date, format = "%m/%d/%Y"), format = "%m"),
            cast = Cast, Station, rep = 1:n(), Depth, Comments) %>% 
  right_join(downcastBreakpoints %>% 
               bind_rows(.id = "fileName") %>% 
               mutate(cast = as.integer(cast)),
             by = c("month", "cast")) %>% 
  group_by(cast = factor(cast), Station = factor(Station)) %>% 
  mutate(timeDifference = as.numeric(difftime(startTime, startTimeLog, units = "mins")))

Although the recorded cast number is generally correct, we can use the difference between start time recorded on the datasheet against on the seabird to make sure that we have correct matches.

distinct(stationCast, cast, rep, Station, timeDifference) %>% 
  ggplot(aes(factor(Station), timeDifference, color = factor(rep, levels = 1:3))) +
  geom_point(size = 5) +
  expand_limits(y = 0) +
  labs(x = "Station", y = "Time Difference (min)", color = "Rep") +
  theme_bw(base_size = 24) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))